Day 10 - Sorting and reducing
$ seq 1 20 | sort -nr | head -n 5
20
19
18
17
16
The final command that I want to show you in this chapter often appears after sort and is called uniq.
Its job is to remove duplicated lines, leaving only one occurrence of each. This command, however,
compares each line only with the one that follows it, which is why we usually run it after a sort.
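To see why the sort matters, here is a minimal sketch with a throwaway input (not part of examples.txt): without sorting, the two occurrences of cat are not adjacent, so uniq leaves both of them in place.
$ printf 'cat\ndog\ncat\n' | uniq
cat
dog
cat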
The file examples.txt contains the word cat several times (because I’m not a cat lover, I’m a feline
worshipper. Guess what my favourite bash command is). You will notice that a plain sort command
lists that word three times in a row. If you run
$ cat examples.txt | sort | uniq
though, you will see it listed only once. Not everybody hates duplicates, though (ask Gaius Baltar),
so uniq has several options that perform different tasks, such as printing only the duplicated
lines. At any rate, in my experience the default behaviour is the most useful one.
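As a quick sketch of that option, -d asks uniq to print only the lines that occur more than once in its (sorted) input; with the examples.txt file used in this chapter you would see something like the following (possibly with other duplicated lines, depending on the rest of the file).
$ cat examples.txt | sort | uniq -d
cat
dog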
Is there anything that people like more than sorting? Oh yeah, there is: pizza! Oh, sorry, I must have
messed up my notes. What was I saying? Oh yes, what do we love more than sorting? Counting,
naturally!
So, uniq can compress a sorted text, removing duplicated lines while counting them, and output
a nice report of the number of times each line appeared.
$ cat examples.txt | sort | uniq -c
1
1 007
1 aardvark
1 basilisk
1 beholder
1 Big Bad Wolf
1 bull
1 C-3PO
3 cat
1 corn dog
1 Cyborg 009
1 direwolf
2 dog
1 dryad
1 Dug the Dog